Abstract


The nature of categories and their acquisition is one of the cen- 
tral open questions in Cognitive Science. We suggest that cate- 
gories are represented via structured descriptions and formed by 
a process of progressive abstraction, through successive com- 
parison with incoming exemplars. This paper describes how 
SEQL (Skorstad, Gentner, & Medin, 1988), a computer model 
for category learning based on SME (Falkenhainer et
al., 1986, 1989; Forbus et al., 1994), can be used to simulate a
recent categorization experiment (Ramscar & Pain, 1996), using 
a new algorithm, Generalization and Exemplar Learning 
(GEL). We demonstrate that SEQL produces behavior consis- 
tent with human subjects. 


Introduction 


Similarity is often viewed as central to categorization. 
For instance, prototype theories of categorization posit that 
categorization decisions are made on the basis of the 
similarity of an entity to the prototypical member of that 
category (Rosch 1975). However, similarity-based accounts 
have been criticized recently on the grounds that they fail to 
capture the role of theory-based knowledge in category for- 
mation (Murphy & Medin, 1985; Murphy & Allopenna 
1994; Wisniewski, 1995). Indeed, subjective similarity can 
sometimes be disassociated from the probability of category 
membership (Keil, 1989; Rips, 1989; Gelman & Wellman, 
1991). For example, bats share many perceptual and behav- 
ioral characteristics with birds (e.g., they both fly). Yet de- 
spite these similarities, we do not categorize bats as birds. 
Rather, bats are categorized as mammals because of nonob- 
vious but theoretically central properties such as giving birth 
to live young. 

This research attempts to bridge the gap between similar- 
ity and categorization. We suggest that many of the prob- 
lems with similarity-based accounts stem from viewing simi- 
larity in terms of featural commonalities (see Goldstone, 
1994). We agree that theory-based knowledge is important 
to categorization, but we do not see that as inimical to a role 
for similarity, provided similarity is modeled appropriately. 
Recent studies have provided evidence that the process of
structural alignment (Markman & Gentner, 1993; Medin,
Goldstone & Gentner, 1993) is a part of the category learn- 
ing process (Lassaline & Murphy, 1998; Ramscar & Pain, 
1996). These studies have shown that similarity, when un- 
derstood as the result of an alignment process, is capable of 
incorporating theory-based knowledge and higher order rela- 
tions into the process of category learning (Gentner & Me- 
dina, 1998). 

This paper describes how a computer model for category 
learning, SEQL (Skorstad, Gentner & Medin, 1988), which 
uses structural alignment to incrementally compute abstrac- 
tions, can be used to simulate recent results in category 
learning. We begin by outlining structure-mapping theory 
and evidence that structural alignment plays an important 
role in categorization. We then describe a progressive ab- 
straction model of category learning, and the GEL (Gener- 
alization and Exemplar Learning) algorithm (see also Blok 
& Gentner, 2000) which implements it in SEQL. Finally, 
we show how the results of a recent study exploring the role 
of structural alignment in category learning (Ramscar & 
Pain, 1996; Darrington, Lingstadt, & Ramscar, 1998) can be 
simulated using SEQL. 


Structural alignment and 
category-based inference 


Structure-mapping theory (Gentner, 1983, 1989) provides 
an account of analogy and similarity based on comparisons 
of structured representations. According to  structure- 
mapping theory, structural alignment takes as input two 
structured representations (base and target) and produces as 
output a set of mappings. Each mapping consists of a set of 
correspondences that align items in the base with items in 
the target and a set of candidate inferences, which are sur- 
mises about the target made on the basis of the base repre- 
sentation plus the correspondences. The constraints on the 
correspondences include structural consistency, i.e., that 
each item in the base maps to at most one item in the target 
and vice-versa (the 1:1 constraint), and that if a correspondence
between two statements is included in a mapping, then so must
correspondences between their arguments (the parallel
connectivity constraint). Which mapping is chosen is governed
by the systematicity constraint: preference is given to
mappings that match systems of relations in the base and
target. Each of these theoretical constraints is motivated by
the role analogy plays in cognitive processing. The 1:1 and
parallel connectivity constraints ensure that the candidate
inferences of a mapping are well-defined. The systematicity
constraint reflects a (tacit) preference for inferential power
in analogical arguments.

Figure 1: The GEL strategy as implemented in SEQL
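The two structural consistency constraints can be illustrated with a minimal sketch. It assumes, purely for illustration, that statements are nested tuples of the form (predicate, arg1, ...), that entities are strings, and that a mapping is a dictionary from base items to target items. The function names are hypothetical and are not part of SME.

```python
def is_one_to_one(mapping):
    """1:1 constraint: no two base items share the same target item."""
    targets = list(mapping.values())
    return len(targets) == len(set(targets))


def is_parallel_connected(mapping):
    """Parallel connectivity: if two statements correspond,
    their arguments must correspond as well."""
    for base_item, target_item in mapping.items():
        if isinstance(base_item, tuple) and isinstance(target_item, tuple):
            base_args, target_args = base_item[1:], target_item[1:]
            if len(base_args) != len(target_args):
                return False
            for b_arg, t_arg in zip(base_args, target_args):
                if mapping.get(b_arg) != t_arg:
                    return False
    return True


def is_structurally_consistent(mapping):
    return is_one_to_one(mapping) and is_parallel_connected(mapping)


# Toy statements (illustrative only, not taken from the story sets).
chase_base = ("chase", "hawk", "rabbit")
chase_target = ("chase", "eagle", "hare")

good = {chase_base: chase_target, "hawk": "eagle", "rabbit": "hare"}
missing_argument = {chase_base: chase_target, "hawk": "eagle"}
many_to_one = {"hawk": "eagle", "rabbit": "eagle"}
```

Only the first mapping satisfies both constraints; the second leaves an argument of an aligned statement unmapped, and the third maps two base entities onto one target entity.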

The structure-mapping view offers new explanations rele- 
vant to categorization. For example, there is evidence that 
humans prefer matches that share common higher-order rela- 
tions over those that share only object-attributes (Gentner, 
Rattermann & Forbus, 1993) and seek systems of commonalities
rather than isolated commonalities when carrying
out comparisons (Clement & Gentner, 1991). These
tacit preferences would contribute to theory-based biases in 
categorization. Ramscar and Pain (1996) tested this possibil- 
ity by asking subjects to sort the Gentner, Rattermann & 
Forbus stories (which varied systematically in their similar- 
ity relations) into categories of their own choosing. As dis- 
cussed below, subjects classified the stories by common 
causal structure rather than by common object features. 


Progressive abstraction via GEL 


We propose a model of category learning by progressive
abstraction. On this account, the representation of a category
is a structured description generated by successive comparisons
with incoming exemplars. The idea is that a set of
generalizations and a set of exemplars are maintained. Each
generalization is regarded as a separate category. The
exemplars are items that are too dissimilar to the existing
categories to be assimilated into any of them. As each new
exemplar E arrives, it is first compared with the existing
generalization(s) G_i, using structural alignment. If the match is
good enough (as compared to a preset threshold) then E is
assimilated into G_i by replacing G_i with the structural
overlap. If no generalization is sufficiently similar, then E is
compared with each stored exemplar E_j. If one of those
matches is over threshold, then their overlap becomes a new
generalization, and E_j is removed from the stored exemplars.
Otherwise, E is added to the set of stored exemplars. This,
intuitively, is the Generalization and Exemplar Learning
algorithm.

Thus GEL works by constantly folding new exemplars 
into an evolving category representation. It is intended to 
capture the incremental, conservative nature of concept 
learning. Early learning is typically highly contextualized, 
with many specific, concrete details (Gentner, 1989; Medin 
& Ross, 1989). With experience, successive comparisons
among exemplars weather away the concrete details and 
eventually reveal the abstractions beneath. 

SEQL (Skorstad, Gentner, & Medin, 1988) provides a 
platform for implementing this algorithm. SEQL was de- 
signed as a cognitive simulation toolkit that could be used to 
implement a variety of schema abstraction strategies, rang- 
ing from pure exemplar to pure prototype strategies, as well 
as mixed strategies. The 1988 paper demonstrated that a
model involving abstraction based on successive comparisons
could model effects due to presentation order in concept
learning. Figure 1 shows the architecture of SEQL.


SEQL uses a simulation of structural alignment as a com- 
ponent both for simulating category-based inference and for 
the comparison process used in category learning. The 
Structure-Mapping Engine (SME) (Falkenhainer et al. 1986, 
1989; Forbus et al. 1994) is a cognitive simulation of ana- 
logical matching. Given base and target descriptions, SME 
finds globally consistent interpretations via a local-to-global 
match process. The base description is either an exemplar 
from the list of stored exemplars (or the first exemplar of the 
input sequence) or a stored generalized description. The 
target description is always a new exemplar from the input 
sequence. SME begins by proposing correspondences, 
called match hypotheses, in parallel between statements in 
the base and target. Then, SME filters out structurally in- 
consistent match hypotheses. Mutually consistent collections 
of match hypotheses are gathered into global mappings us- 
ing a greedy merge algorithm. An evaluation procedure 
based on the systematicity principle is used to compute the 
structural evaluation for each mapping. 
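The merge step can be pictured with a deliberately simplified sketch. It is not SME's actual algorithm: it assumes that each mutually consistent collection of match hypotheses has already been reduced to a score and a dictionary of base-to-target correspondences, and the names `consistent` and `greedy_merge` are hypothetical.

```python
def consistent(merged, kernel):
    """True if adding `kernel` keeps the combined mapping one-to-one."""
    reverse = {target: base for base, target in merged.items()}
    for base, target in kernel.items():
        if merged.get(base, target) != target:
            return False  # base item already mapped to a different target
        if reverse.get(target, base) != base:
            return False  # target item already claimed by a different base
    return True


def greedy_merge(kernels):
    """kernels: list of (score, {base_item: target_item}) pairs.
    Visit kernels from highest to lowest score and keep each one
    that does not conflict with what has been merged so far."""
    merged, total = {}, 0.0
    for score, kernel in sorted(kernels, key=lambda k: k[0], reverse=True):
        if consistent(merged, kernel):
            merged.update(kernel)
            total += score
    return merged, total


# Toy input (illustrative scores and correspondences).
kernels = [
    (3.0, {"hawk": "eagle", "rabbit": "hare"}),
    (2.0, {"hawk": "hare"}),        # conflicts with the first kernel
    (1.0, {"feather": "arrow"}),
]
mapping, score = greedy_merge(kernels)
```

In this toy run the second kernel is rejected because it would map "hawk" to two different targets, so the merged mapping combines the first and third kernels.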

We will need a few conventions to describe the algorithm 
underlying SEQL and GEL. The overlap found when com- 
paring two descriptions is defined as the set of statements in 
the base which have correspondences in the best mapping 
SME produces for those two descriptions. To determine if
an exemplar and a concept description are “sufficiently
similar”, we need to use the notion of structural evaluation to
provide a numerical summary of similarity that is independent
of the size of the descriptions. Previous cognitive simulation
studies using SME (cf. Falkenhainer, Forbus, &
Gentner, 1989) have demonstrated the structural evaluation
of the strongest mapping to be correlated with such judgments.
Consequently, let SE(b,t) be the structural
evaluation score of the best mapping SME finds for the
comparison between descriptions b and t. Our numerical
summary of similarity is defined as follows:


NSIM(d,e) = SE(d,e) / SE(d,d)


that is, the structural evaluation score of the concept de- 
scription d compared with e, normalized by the score of the 
concept description compared to itself. This limits NSIM to 
being between 0 and 1, which simplifies the comparison 
between the structural evaluation score and a threshold 
value. We use a threshold T to decide whether or not a 
match between an exemplar and a concept description is 
considered “good”, i.e. sufficiently similar: 


NSIM(d,e) ≥ T


The threshold T determines how conservative the system 
will be: If T = 1.0, then no abstraction will occur, since only 
perfect matches would be grouped. If T = 0.0, then any two 
descriptions could match, leading quickly to an empty de- 
scription as the concept representation. 
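The normalization and the threshold test can be sketched as follows. The function `se` below is only a stand-in that counts statements shared by two descriptions; it is an assumption made for illustration and is not SME's structural evaluation. The threshold test is assumed here to be inclusive. All function names are hypothetical.

```python
def se(base, target):
    """Stand-in for SE(b,t): number of statements shared by two
    descriptions, each given as a frozenset of statements.
    (Assumption: SME's real score comes from its best mapping.)"""
    return float(len(base & target))


def nsim(d, e):
    """NSIM(d,e) = SE(d,e) / SE(d,d), normalized by the concept
    description d compared with itself."""
    self_score = se(d, d)
    return se(d, e) / self_score if self_score else 0.0


def sufficiently_similar(d, e, threshold):
    return nsim(d, e) >= threshold


# Toy descriptions (illustrative only).
concept = frozenset({"chase", "cause", "attack", "hawk"})
exemplar = frozenset({"chase", "cause", "attack", "eagle", "cliff"})
```

Here the exemplar shares three of the concept's four statements, so NSIM is 0.75; the pair passes at a threshold of 0.7 and fails at 0.8. Note that the measure is asymmetric, since it is normalized by the concept description rather than by the exemplar.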


The Generalization Algorithm 


If the base and target are judged sufficiently similar by the
process just described, a generalization process will transform
the most promising¹ mapping between the two descriptions
into a new generalized description. The generalization
algorithm creates a new description by copying the predicate
structure in the overlap of the mapping, substituting generic
labels for particular entity names but preserving the predicate
structure, including the specific predicate names used².
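The substitution step can be sketched in a few lines. The sketch assumes that the overlap is a list of nested tuples (predicate, arg1, ...) with entity names as strings; the label format "entityN" and the function name `generalize` are hypothetical choices, not SEQL's.

```python
def generalize(overlap):
    """Copy the predicate structure of `overlap`, replacing each
    distinct entity name with a generic label. Predicate names
    (the first element of each tuple) are kept unchanged."""
    labels = {}

    def rename(term):
        if isinstance(term, tuple):
            return (term[0],) + tuple(rename(arg) for arg in term[1:])
        if term not in labels:
            labels[term] = f"entity{len(labels) + 1}"
        return labels[term]

    return [rename(statement) for statement in overlap]


# Toy overlap (illustrative only).
overlap = [
    ("chase", "hawk", "rabbit"),
    ("cause", ("chase", "hawk", "rabbit"), ("flee", "rabbit")),
]
generalized = generalize(overlap)
```

Each entity receives the same label wherever it occurs, so the relational structure of the overlap is preserved while the particular names are abstracted away.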


The GEL algorithm 


The GEL strategy is based on two observations: (1) Natu- 
ral sequences of input stimuli may consist of examples from 
several different categories. Furthermore, some natural con- 
cepts involve disjunctive descriptions. This suggests main- 
taining multiple generalizations. In the case where all input 
stimuli are assumed to be from the same category (e.g., su- 
pervised learning), the set of generalizations serves as a dis- 
junctive concept definition. Otherwise, each generalization 
represents a distinct category. (2) Exemplars that are initially
singular may only be the first of many similar exemplars
to come. This suggests maintaining a list of singular
exemplars which currently do not fit any existing generalization
well. Singular exemplars are transformed into generalizations
if a later exemplar matches them best.

¹ Generally SME produces more than one mapping for a comparison.
SEQL only considers the mapping with the highest structural
evaluation score.

² In the case of matches involving non-identical functions, the
function in the base description is used.


Generalization and Exemplar Learning (GEL): A concept 
description consists of G, a list of generalizations, and S, a 
set of exemplars. Elements of S are structured descriptions, 
and elements of G are tuples of a structured description and 
an integer N indicating the number of exemplars assimilated 
into that generalization. If nothing is known about a concept, 
G and S are initially empty. When a new exemplar E_i arrives,
the following occurs:

Procedure GEL(E_i, G, S, T)
1. For each <G_j, N_j> ∈ G,
   a. If NSIM(G_j, E_i) > T then
      <G_j, N_j> ← <Generalization(G_j, E_i), N_j + 1>; go to 4.
2. For each S_j ∈ S,
   a. If NSIM(S_j, E_i) > T then let G_new = Generalization(S_j, E_i);
      G ← G ∪ {<G_new, 2>}; S ← S − {S_j}; go to 4.
3. S ← S ∪ {E_i}.
4. Sort G by the N_j's so that it will be searched from
   maximum to minimum number of exemplars involved.


The sort step improves the likelihood that good generali- 
zations will be found, if they exist, by preferentially building 
up strong existing generalizations. The assimilation of sin- 
gular examples in step 2 serves a similar purpose. 
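The procedure can be traced with a minimal sketch under strong simplifying assumptions: descriptions are frozensets of statements, the overlap is plain set intersection, and NSIM is the fraction of the stored description's statements found in the new exemplar. None of this uses SME, and the names `nsim` and `gel_step` are hypothetical.

```python
def nsim(d, e):
    """Toy NSIM: share of d's statements that also occur in e."""
    return len(d & e) / len(d) if d else 0.0


def gel_step(exemplar, G, S, T):
    """Process one exemplar. G is a list of (description, count)
    pairs, S is a list of stored exemplars; both are updated in place."""
    matched = False
    for idx, (gen, count) in enumerate(G):        # step 1
        if nsim(gen, exemplar) > T:
            G[idx] = (gen & exemplar, count + 1)
            matched = True
            break
    if not matched:
        for stored in S:                          # step 2
            if nsim(stored, exemplar) > T:
                G.append((stored & exemplar, 2))
                S.remove(stored)
                matched = True
                break
    if not matched:
        S.append(exemplar)                        # step 3
    G.sort(key=lambda pair: pair[1], reverse=True)  # step 4


# Toy exemplars (illustrative only).
b = frozenset({"chase", "cause", "hawk", "feathers"})
ls = frozenset({"chase", "cause", "hawk", "wings"})
fa = frozenset({"chase", "river", "boat", "storm"})

G, S = [], []
for exemplar in (b, ls, fa):
    gel_step(exemplar, G, S, T=0.7)
```

In this run the first exemplar is stored, the second matches it (0.75 > 0.7) and the pair is replaced by their three-statement overlap with a count of 2, and the third matches nothing and is stored as a singular exemplar.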


A Case Study 


Ramscar and Pain (1996) investigated the role of struc- 
tural similarity in categorization by having subjects sort sto- 
ries taken from the “Karla the Hawk” similarity experiments 
(Gentner, Rattermann, & Forbus, 1993). Each set had six 
stories: a base plus five variants. All variants except OO 
have first-order relational commonalities with the base (i.e.,
common events: e.g., chase/pursue). In addition, some vari- 
ants share object attributes (i.e., similar entities: e.g., 
hawk/eagle). In addition, some share higher-order relations 
such as common causal structure. The following relation- 
ships hold: 


B, the base story. The properties of the other stories 
in the set are defined with respect to this story. 

LS, the literally similar story, shares all levels with
the base: object attributes, first-order events and higher-order
relational structure.

TA, the analogy story, shares first-order and higher- 
order relations with the base, but not object attributes. 

MA, the mere-appearance story, shares first-order re- 
lations and object attributes with the base, but not 
higher-order relational structure. 

FA, the false analogy story, shares only first-order relations
with the base. (It is an analog of the MA story.)

OO, the objects-only story, shares only object attrib- 
utes with the base. 




Ramscar and Pain gave each subject 10 sets of six stories,
one set at a time. Stories within a set were presented in
randomized order, and the order of sets of stories was also
randomized. For each set, subjects read every story in the set
to familiarize themselves with the stories, and then were
asked to “group the stories into the categories that seem 
most natural and appropriate to you. These groups can 
range from putting every member of the story set into the 
same group, to putting each story into a group on its own.” 

Ramscar & Pain (1996) identified ten types of groupings, 
of which only five exceeded 3% responding. The two most
common groupings were based on a shared relational structure,
with the base either included (B-LS-TA, FA-MA, OO --
79.5%) or separately classified (B, LS-TA, FA-MA, OO --
8%). The next two groupings were based on shared first-order
relations (B-LS-TA-FA-MA, OO -- 4%) or on object
similarities (5%) -- either B-LS-MA-OO, FA-TA or B-OO,
LS-MA, FA-TA.

Overall, subjects showed a strong preference to group sto- 
ries that shared systematic relational structure, consistent 
with the predictions of structure-mapping theory; 87.5% of 
the groupings were based on common first-order and higher- 
order relations. 

Figure 2: Simulation Results. Threshold values (x axis) are
expanded towards the right to focus on key behavior


Simulation 


Can GEL and SEQL capture this categorization behavior? 
To test this, we used representations of eight of the “Karla 
the Hawk” story sets that had been generated for an earlier 
study (Forbus, Gentner, & Law, 1995). These representa- 
tions were generated before the Ramscar and Pain experi- 
ment, so that the choices made in creating the representa- 
tions were made independently. Each set consists of five 
stories, a base (B), a literal similarity (LS), a true analogy 
(TA), a false analogy (FA), and a mere appearance (MA) 
story. (The object-only (OO) stories had not been repre- 
sented in the previous experiment.) For each of the eight sets 
of stories, we ran SEQL on every possible sequence (i.e., 5!
= 120 orderings). We identified five types of groupings:


Higher-order Relational Structure (HOR): All
groupings based on shared structural relations. This includes
Ramscar and Pain’s top two groupings (B-LS-TA, MA-FA
and B, LS-TA, MA-FA), as well as groupings
in which MA and FA are separated, e.g., B-LS-TA,
FA, MA and B, LS-TA, MA, FA.

Literal Similarity and Base (LS) (Conservative 
Similarity): All groupings in which only the literal 
similarity story is grouped with the base story and the 
true analogy story is kept separate, i.e. B-LS, TA, MA- 
FA and B-LS, TA, MA, FA. 

ALL: The grouping that includes all stories, i.e. B- 
LS-TA-MA-FA. 

SEP: No groupings at all; each story separate. 

Other: All other unclassifiable groupings. 
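The five grouping types can be written down as a small classifier. This is a sketch that assumes the HOR and LS types consist of exactly the groupings spelled out above (the text gives the HOR cases as examples, so the enumeration may be incomplete); the function and constant names are hypothetical.

```python
def canon(grouping):
    """Order-independent form of a grouping (a list of lists of story labels)."""
    return frozenset(frozenset(group) for group in grouping)


HOR_GROUPINGS = {canon(g) for g in [
    [["B", "LS", "TA"], ["MA", "FA"]],
    [["B"], ["LS", "TA"], ["MA", "FA"]],
    [["B", "LS", "TA"], ["FA"], ["MA"]],
    [["B"], ["LS", "TA"], ["MA"], ["FA"]],
]}
LS_GROUPINGS = {canon(g) for g in [
    [["B", "LS"], ["TA"], ["MA", "FA"]],
    [["B", "LS"], ["TA"], ["MA"], ["FA"]],
]}
ALL_GROUPING = canon([["B", "LS", "TA", "MA", "FA"]])
SEP_GROUPING = canon([["B"], ["LS"], ["TA"], ["MA"], ["FA"]])


def classify(grouping):
    key = canon(grouping)
    if key in HOR_GROUPINGS:
        return "HOR"
    if key in LS_GROUPINGS:
        return "LS"
    if key == ALL_GROUPING:
        return "ALL"
    if key == SEP_GROUPING:
        return "SEP"
    return "Other"
```

For example, `classify([["TA", "LS", "B"], ["FA", "MA"]])` yields "HOR" regardless of the order in which stories or groups are listed, while a grouping such as B-MA, LS, TA, FA falls under "Other".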


The simulation results are shown in Figure 2. The x axis 
shows the threshold value and the y axis the percentage of 
groupings found at that value. Each data point represents the 
average of 960 simulation runs (120 sequences * 8 sets). For 
the whole graph (14 threshold values) we ran a total of 
13,440 sequences, which required more than 60,000 SME 
matches. 
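The run counts follow directly from the design and can be checked with a short calculation. The variable names are illustrative; the last line only derives the average number of matches per run that the reported lower bound of 60,000 implies.

```python
from itertools import permutations

stories = ["B", "LS", "TA", "FA", "MA"]
orders = list(permutations(stories))       # every presentation order of one set

story_sets = 8
thresholds = 14

runs_per_threshold = len(orders) * story_sets
total_runs = runs_per_threshold * thresholds

# Average number of SME matches per run implied by "more than 60,000".
min_matches_per_run = 60_000 / total_runs
```

This gives 120 orders per set, 960 runs per threshold value and 13,440 runs in total, so the reported figure corresponds to an average of more than about 4.46 SME matches per run.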

The first thing to notice is the range of patterns that results 
as the threshold is varied. As we move from low to high 
thresholds the pattern changes from indiscriminate lumping 
of all stories together on the left (hyper-generalization) to
finicky separation of each story into its own category at
the extreme right (hyper-discrimination). Below T = 0.85, 
the majority of groupings produced are of the ALL type 
(hyper-generalization), and when T > 0.99, the stories are 
grouped into individual categories. 

In the threshold range between 0.85 and 0.98 we see a pattern
that matches well with human results: There is a strong
preference for groupings based on common higher-order 
relational structure. At 0.85 the number of groupings based 
on higher-order relations reaches about 70 percent. At the 
same time, the number of unclassifiable Other groupings 
drops below 10 per cent. Higher-order groupings remain 
dominant until the threshold becomes extremely high. As we 
move to the upper bound of the humanlike range, there is 
first a rise in the more conservative LS groupings. Finally, 
as the threshold approaches 1.0, almost no stories will be
grouped together, leading to a total separation: each story
will form its own category.

A comparison of our results with the study by Ramscar 
and Pain would suggest that their data correspond to a 
threshold value around 0.95 in our simulation. The only ma- 
jor difference is that their human subjects did not produce as 
many ALL groupings (comparable to Ramscar and Pain’s 
Type 3 groupings) as SEQL does in our experiment.


Order Effects 


According to the progressive alignment model, there should be
effects of the order in which the items are
received. Assuming a moderate threshold, SEQL’s new
categories will initially be highly conservative -- closely
identified with the first few instances that comprise them.
Subsequent comparisons will lead to further abstractions and
‘wear away’ specific features. Thus the generalization will
gradually come to match a wider range of new items. This
means that the overlap among the first few exemplars determines
the initial encoding of the category. Thus we should
be able to manipulate the likelihood that SEQL (or human
subjects) arrives at the HOR grouping by varying the initial
exemplars.

³ The rising number of LS groupings for threshold values beyond
0.95 stems entirely from differentiating previous HOR groupings.
(The groupings for each threshold value are mutually exclusive.)

In further studies we tested the progressive alignment pre- 
diction (Quinn et al., in preparation). We ran a simulation 
contrasting structure-promoting orders with surface-promoting
orders. A structure-promoting order is one in
which stories that overlap in relational structure (e.g., B,
LS, TA) occur first, and a surface-promoting order is one in
which stories with common attribute matches (LS, MA, FA)
occur first. The results of the simulation showed that on 
structure-promoting orders, SEQL produced structure-based 
groupings 90% of the time. In contrast, for surface- 
promoting orders, SEQL produced structure-based group- 
ings only 70% of the time. 

Thus early presentation of structural or surface common- 
alities will bias the kinds of categories formed. This result 
is consistent with the prior findings of order effects in hu- 
man category learning (Elio & Anderson 1984; Watten- 
maker 1993; Medin & Bettger 1994). However, to achieve 
a closer test, Quinn, Kuehne, Gentner and Forbus (in prepa- 
ration) ran human subjects on a serial categorization study 
contrasting structure-promoting and surface-promoting or- 
ders.* Subjects received the stories one at a time on a work- 
station, and were allowed to see each story only once. After 
reading the five stories in each set, participants were asked 
to group them into categories that seemed “most natural and 
appropriate.” Half the subjects received the stories in struc- 
ture-promoting orderings — e.g., {B, LS, TA, FA, MA}. The 
other half received them in surface-promoting orders — e.g., 
{MA, LS, TA, FA, B}. (Four variants were used in each 
condition.)

People produced more structural groupings in the
structure-promoting orders (88.9%) than in the surface-promoting
orders (71.4%). The corresponding proportions for
SEQL were 85.7% and 71.4%. Interestingly, when subjects were
required to write out justifications for their groupings, their 
percentages of structural groupings increased (to 92.9 and 
79.8 in structure-promoting and surface-promoting orders, 
respectively). Although these results bear out our predictions 
concerning order, there was also a discrepancy: unlike hu- 
man subjects, SEQL did not produce surface groupings 
when given the surface-promoting orders; instead it produced
various ‘other’ groupings. This suggests that the object
representations we used in our simulation were not as
rich as human representations. Forbus and Gentner’s (1989)
specificity conjecture suggests that more low-order information
would lead to SEQL more closely modeling the human
data.

* Seven of the Karla the Hawk story sets were used (the same
sets used by Ramscar and Pain (1996)). As in the above simulations,
each story set contained a base (B) story, a literal similarity
(LS) story, a true analogy (TA) story, a mere appearance (MA)
story, and a false analogy (FA) story.


Discussion 


A growing number of studies suggest that structural align- 
ment plays a central role in categorization. This paper 
shows that a simulation based on progressive abstraction via 
successive comparisons can model some of the recent hu- 
man results in this area. The Generalization and Exemplar 
Learning strategy for SEQL presented here provides a com- 
putational model for progressive abstraction. In our ex- 
periments we have shown that SEQL using GEL can pro- 
duce results that are consistent with human categorization 
behavior. By using SME, a simulation of structure-mapping 
theory that already has considerable support in terms of psychological
plausibility (cf. Gentner & Markman, 1997),
SEQL extends the computational account that mirrors the 
psychological evidence that implicates structural alignment 
in categorization. 

SEQL’s behavior in the ‘moderate threshold’ range fits 
fairly well with that of Ramscar and Pain’s human subjects. 
What about its behavior when given more extreme thresh- 
olds? We speculate that these kinds of threshold effects oc- 
cur in human similarity as well. For example, a person iden- 
tifying her own door key sets a higher match threshold than 
a person who is simply asked to identify an object as a key. 
In the other direction, a person who simply needed a small 
metal object (say, to weight down a transparency sheet) 
would apply a more liberal match criterion -- a key or a coin 
would be equally fitting. 

Although SEQL provides an interesting model for some 
aspects of categorization, it has several important limita- 
tions. There is no recovery or limiting mechanism that pre- 
vents successive generalizations from becoming so abstract 
that they have no inferential power. The use of a high simi- 
larity threshold makes such vacuous abstractions less likely, 
but what is needed is a better understanding of when and 
how such outcomes are (or are not) avoided in learning. 
SEQL also does not model the effects of learning contrasting 
categories. Nevertheless, SEQL does suggest even in its 
current state that some of the mysteries of categorization 
may be solved by further explorations of progressive ab- 
straction via structural alignment. 